ABSTRACT

The predictive analysis approach to adaptive testing
originated in the idea of statistical predictive analysis suggested
by J. Aitchison and I. R. Dunsmore (1975). The adaptive testing model
proposed is based on a parameter-free predictive distribution.
Aitchison and Dunsmore define statistical prediction analysis as the
use of data obtained from an informative experiment in the past to
make some reasonable statement about the outcome of a future
experiment. The approach's predictive density function,
item selection procedure, and terminating criteria are discussed. A
small-scale exploratory study compared the approach with A. Wald's
sequential probability ratio test (1947), F. M. Lord's flexilevel
test (1971), R. J. Owen's Bayesian strategy (1975), and F.
Samejima's maximum likelihood strategy (1977). The various approaches
could not be placed on an equal footing in terms of the data used. Results
indicate that: (1) final predictive probabilities were significantly
correlated with total scores; (2) predictive adaptive testing
performed better than sequential probability ratio testing and almost as
well as the Bayesian strategy in the area of mastery classification;
(3) the benefit of adaptive testing could not be demonstrated in the
area of the number of test items required; and (4) there was no
effect on the number of misclassifications of students when different
priors were used in predictive testing. Three tables are included.
(TJH) 
INTRODUCTION



With the recent developments in item response theory and computer
technology, the conditions required for putting
tailored testing into practice seem mature. Yet implementations
of tailored testing are confined to situations where
sample sizes large enough to calibrate items can be obtained, because item
response theory requires large sample sizes for parameter
estimation.

The purpose of the study was to develop an adaptive testing
model based on a parameter-free predictive distribution. In addition
to the derivation of a predictive distribution, item selection strategies
and terminating criteria were obtained. The feasibility of the model
was also investigated by comparing its performance with the
performance of Lord's flexilevel test (Lord, 1971), Wald's sequential
probability ratio test (SPRT; Wald, 1947), Owen's Bayesian item
selection strategy (Owen, 1975), and the maximum likelihood item
selection strategy (Samejima, 1977). The performance in adaptive
testing was simulated using actual data obtained from a paper-and-pencil
test.

THE MODEL

The predictive analysis approach to adaptive testing originates
from the idea of statistical predictive analysis suggested by Aitchison
and Dunsmore (1975). Statistical predictive analysis is composed
of two parts: an informative experiment E and a future experiment F. An
informative experiment E is an experiment which was performed in the
past, and its typical outcome is denoted by x. In the same manner, a
future experiment F is an experiment which is to be carried out in the
future, and its typical outcome is denoted by y.

Aitchison and Dunsmore define statistical prediction analysis as
the use of data, which are obtained from an informative experiment
E in the past, to make some reasonable statement about the outcome
of the future experiment F. This analysis rests on two assumptions.

a) The probability distributions which describe the informative
experiment E and the future experiment F have the same unknown
parameter space (trait).

b) For a given trait, the experiments E and F are independent.

Predictive Density Function 

Let $\theta$ be the unknown trait to be measured and $f(\theta)$ be the prior
density of $\theta$. Let $f(x \mid \theta)$ be the probability density function of x, which
is the typical outcome of the informative experiment E. Hence the
posterior density satisfies $f(\theta \mid x) \propto f(x \mid \theta) f(\theta)$.

The predictive density function of y is obtained from the
posterior distribution. Let $f(y \mid \theta)$ be the density function of the future
outcome y. Taking the Bayesian approach to
predictive problems, the distribution of the future outcome y given
the outcome x of the informative experiment is defined by

$$f(y \mid x) = \int f(y \mid \theta) f(\theta \mid x) \, d\theta$$

which is called the predictive density function.

In the above derivation, it is assumed that x and y are
continuous variables. The same derivation technique is applicable to
discrete variables. As seen, the predictive density function does not
involve the parameter $\theta$ (trait). Yet it is possible to make inferences about
the magnitude of future observations for the same trait.
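
As a numerical illustration of the integral above, the following sketch assumes a beta prior and binomial outcomes (the combination used in the item selection procedure of this paper) and approximates the predictive probability with a midpoint rule. The function names, the grid size, and the example values are illustrative assumptions and are not taken from the study.

```python
import math

def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def binom_pmf(k, n, theta):
    """Probability of k successes in n Bernoulli trials with success rate theta."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)

def predictive_pmf(y, N, x, n, g, h, grid=2000):
    """Approximate f(y|x) = integral of f(y|theta) f(theta|x) dtheta.

    x correct out of n past items, y correct out of N future items,
    Beta(g, h) prior on theta (assumed here for illustration).
    """
    width = 1.0 / grid
    numerator = 0.0
    normaliser = 0.0
    for i in range(grid):
        theta = (i + 0.5) * width                     # midpoint of the i-th cell
        unnormalised_posterior = binom_pmf(x, n, theta) * beta_pdf(theta, g, h)
        numerator += binom_pmf(y, N, theta) * unnormalised_posterior
        normaliser += unnormalised_posterior
    return numerator / normaliser                     # theta has been integrated out

# Probability that the next single item is answered correctly
# after 3 correct answers out of 5, with a Beta(2, 2) prior.
p_next = predictive_pmf(1, 1, x=3, n=5, g=2, h=2)
```

The result depends only on the observed data and the prior constants, which is the sense in which the predictive distribution is parameter-free.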

Item Selection Procedure

In order to find the most appropriate item to administer to an
examinee, two predictive probability functions have to be specified.
One, with a prior belief, describes the examinee's ability level, and the
other, with a prior belief, represents the difficulty level of the item.

If a beta distribution is used to represent the prior belief about an
examinee's ability level and a binomial distribution is specified for
the informative experiment, the following predictive distribution
(beta-binomial) for an ability level is obtained by applying the steps
which are described in the previous section

$$f_a(y \mid x) = \binom{N}{y} \frac{\Gamma(\alpha+\beta)\,\Gamma(y+\alpha)\,\Gamma(N+\beta-y)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(N+\alpha+\beta)}, \qquad y = 0, 1, \ldots, N \tag{1}$$

where $\alpha = x + g$ and $\beta = n + h - x$. The parameters of the prior distribution are g>0
and h>0, where g and h are called a location parameter and a scale
parameter, respectively. The sample size of the informative
experiment, that is, the number of items already administered, is n, and
the number of items to be administered in the future is N. The
numbers of correct answers in the past experiment and in the future
experiment are represented by x and y, respectively.
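
A minimal sketch of formula (1) is given below, written with log-gamma functions to avoid overflow for larger N. The function name and the example values (7 items administered, 16 items remaining) are illustrative assumptions.

```python
import math

def beta_binomial_pmf(y, N, x, n, g, h):
    """Predictive probability f_a(y|x) of y correct answers on N future items.

    x: number correct among the n items already administered
    g, h: parameters of the beta prior (g > 0, h > 0)
    """
    alpha = x + g
    beta = n + h - x
    log_p = (
        math.lgamma(N + 1) - math.lgamma(y + 1) - math.lgamma(N - y + 1)   # log C(N, y)
        + math.lgamma(alpha + beta) + math.lgamma(y + alpha) + math.lgamma(N + beta - y)
        - math.lgamma(alpha) - math.lgamma(beta) - math.lgamma(N + alpha + beta)
    )
    return math.exp(log_p)

# Example: 5 correct out of 7 administered, 16 items remaining, prior g = 2, h = 2.
distribution = [beta_binomial_pmf(y, 16, x=5, n=7, g=2, h=2) for y in range(17)]
```

For a single future item (N = 1) the formula reduces to alpha / (alpha + beta), which gives a quick check on any implementation.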

Another predictive distribution based on the prior belief about
item difficulty b can also be represented by formula (1), except that
the values of the location and scale parameters of the prior
distribution will be different. This predictive distribution is
designated as $f_b(y \mid x)$. To obtain the probability of answering the next
item correctly, given item difficulty b and x items correct in
the past, the proportionality of $f(y=1 \mid b, x)$ to $f(b \mid x, y=1) f_b(y=1 \mid x)$ is
used, where $f(b \mid x, y=1)$ is the posterior probability of item difficulty
given past and future information about an examinee. To find the most
appropriate item to administer to an examinee, the following criterion
is considered

$$\min_b \left| f_a(y=1 \mid x) - \bigl(1 - f(y=1 \mid b, x)\bigr) \right|.$$

The above criterion is constructed by considering the almost perfect
correlation between item difficulty b and $f(y=1 \mid b, x)$, and also the
negative correlation between $f_a(y=1 \mid x)$ and $f(y=1 \mid b, x)$. According to
the above criterion, the most appropriate item to be administered is
the one whose difficulty matches the examinee's ability level.
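
The selection rule can be sketched as a simple minimisation over the items not yet administered. In this sketch the per-item probabilities f(y=1|b,x) are taken as given inputs, since their computation depends on the item priors. The function name, the item labels, and the numbers are hypothetical.

```python
def select_item(p_ability, p_given_difficulty, administered=()):
    """Return the item minimising |f_a(y=1|x) - (1 - f(y=1|b,x))|.

    p_ability: f_a(y=1|x), predictive probability based on the ability prior
    p_given_difficulty: dict mapping item id -> f(y=1|b,x) for that item
    administered: item ids that have already been used
    """
    candidates = {
        item: p for item, p in p_given_difficulty.items() if item not in administered
    }
    if not candidates:
        raise ValueError("no unused items left")
    return min(candidates, key=lambda item: abs(p_ability - (1.0 - candidates[item])))

# Hypothetical values for three items.
pool = {"A": 0.20, "B": 0.45, "C": 0.70}
next_item = select_item(0.60, pool)                      # "B": |0.60 - 0.55| is smallest
after_b = select_item(0.60, pool, administered=("B",))   # "A": |0.60 - 0.80| < |0.60 - 0.30|
```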

Terminating Criteria

After a minimum number of items is administered, a decision has to
be made by choosing one of the actions
a1 = examinee is a non-master,
a2 = no decision can be made, continue testing,
a3 = examinee is a master.

To decide when to terminate the testing, a predictive
distribution, which is based on the prior belief about an examinee's ability
level, and a utility function are employed. The proportions of the
non-mastery (R1), undecided (R2), and mastery (R3) regions are
designated in advance as $\delta_1$, $\delta_2$, $\delta_3$, respectively. To make a
decision, the likely numbers of correct answers in the future which may
lie in the regions are determined as follows: $k_1 = \delta_1 \times N$, $k_2 = \delta_2 \times N$,
$k_3 = \delta_3 \times N$, where N is the number of items to be administered in
the future and $k_1$, $k_2$, and $k_3$ are taken to be the closest integer
values. If the model predicts that an examinee can answer only 0, 1, or
up to $k_1 - 1$ items correctly out of the remaining N items, the examinee
is placed in the non-mastery region (R1). If the model predicts that an
examinee can answer $k_1$ through $N - k_3$ items correctly, the examinee is
placed in the undecided region (R2). If more than $N - k_3$ correct items
are predicted, the examinee is placed in the mastery region (R3). Thus,
the proportions of the regions sum to 1, $\delta_1 + \delta_2 + \delta_3 = 1$,
and $k_1 + k_2 + k_3 = N$. To choose proper values for
$\delta_1$, $\delta_2$, $\delta_3$, one may consider the proportions of the
remaining N items answered correctly. If $\delta_1$, $\delta_2$, $\delta_3$ are 0.3, 0.4, 0.3, respectively,
this implies that if the examinee can answer less than 0.3 of the
remaining items correctly, he/she will be placed in the non-mastery
region (R1). If the examinee can answer 0.3 or more but no more than
0.7, which is the sum of $\delta_1$ and $\delta_2$, he/she will be placed
in the undecided region (R2).
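
The partition of the possible future scores 0, 1, ..., N into the three regions can be sketched as follows. The function name is an assumption, and taking k2 as the remainder N - k1 - k3 (so that the three values always sum to N) is a choice made for this sketch; the text only states that the closest integers are used.

```python
def mastery_regions(N, deltas=(0.3, 0.4, 0.3)):
    """Split the possible numbers of correct future answers into R1, R2, R3."""
    delta1, _delta2, delta3 = deltas
    k1 = round(delta1 * N)      # Python rounds exact halves to the nearest even integer
    k3 = round(delta3 * N)
    k2 = N - k1 - k3            # assumption: remainder, so that k1 + k2 + k3 = N
    non_mastery = range(0, k1)              # 0 .. k1 - 1
    undecided = range(k1, N - k3 + 1)       # k1 .. N - k3
    mastery = range(N - k3 + 1, N + 1)      # more than N - k3
    return (k1, k2, k3), (non_mastery, undecided, mastery)

# With 10 items remaining and proportions 0.3, 0.4, 0.3:
ks, (r1, r2, r3) = mastery_regions(10)
# ks == (3, 4, 3); r1 covers 0-2, r2 covers 3-7, r3 covers 8-10
```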

Let us define the utility function for action $a_i$:

$$u(a_i, y) = \begin{cases} 1 & \text{if } y \in R_i \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, 3$$

Then, the terminating criterion is defined as the choice of the action
$a_i$ which gives maximum utility,

$$\max\Bigl( \sum_{y \in R_i} u(a_i, y) f(y \mid x), \; i = 1, 2, 3 \Bigr)$$

(Aitchison and Dunsmore, 1975, Ch. 8). After simplifying the above
criterion, the following is obtained

$$\max\Bigl( p_1 = \sum_{y \in R_1} f(y \mid x), \; p_2 = \sum_{y \in R_2} f(y \mid x), \; p_3 = \sum_{y \in R_3} f(y \mid x) \text{ or } p_3 = 1 - p_1 - p_2 \Bigr)$$

Then, the decision is simply the choice of the maximum probability
$p_i$, which is calculated over the region $R_i$.
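
Putting the beta-binomial predictive distribution and the three regions together, the decision rule can be sketched as below. This is only an illustration under the assumptions already stated in the text (beta prior with constants g and h, proportions 0.3, 0.4, 0.3); the function names and the example response counts are hypothetical.

```python
import math

def beta_binomial_pmf(y, N, x, n, g, h):
    """Predictive probability of y correct on N future items (formula (1))."""
    alpha, beta = x + g, n + h - x
    log_p = (
        math.lgamma(N + 1) - math.lgamma(y + 1) - math.lgamma(N - y + 1)
        + math.lgamma(alpha + beta) + math.lgamma(y + alpha) + math.lgamma(N + beta - y)
        - math.lgamma(alpha) - math.lgamma(beta) - math.lgamma(N + alpha + beta)
    )
    return math.exp(log_p)

def terminating_decision(x, n, N, g, h, deltas=(0.3, 0.4, 0.3)):
    """Choose the action whose region carries the largest predictive probability."""
    k1 = round(deltas[0] * N)
    k3 = round(deltas[2] * N)
    pmf = [beta_binomial_pmf(y, N, x, n, g, h) for y in range(N + 1)]
    p1 = sum(pmf[:k1])               # y in R1: 0 .. k1 - 1
    p3 = sum(pmf[N - k3 + 1:])       # y in R3: N - k3 + 1 .. N
    p2 = 1.0 - p1 - p3               # y in R2: everything in between
    probabilities = {"non-master": p1, "continue testing": p2, "master": p3}
    action = max(probabilities, key=probabilities.get)
    return action, probabilities

# Seven items administered, 16 remaining.
high, _ = terminating_decision(x=7, n=7, N=16, g=2, h=1)   # all seven correct
low, _ = terminating_decision(x=0, n=7, N=16, g=1, h=2)    # none correct
```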

To make a decision for any examinee who is still in the undecided
region (R2) after reaching the maximum number of items to be
administered, three approaches were employed in this study. These
three approaches are listed in order of preference: (a) comparison of the
final p1 and p3 values, (b) comparison of his/her final predictive
probability with those of others having similar predictive probabilities,
and (c) comparison of his/her percentage of correct answers
with those of other examinees.

COMPARISONS WITH OTHER STRATEGIES

To investigate the feasibility of the predictive adaptive testing
strategy, a small-scale exploratory study was made to compare the
performance of the strategy with the performance of Lord's flexilevel
test (Lord, 1971), Wald's sequential probability ratio test (Wald,
1947), Owen's Bayesian strategy (Owen, 1975), and the maximum
likelihood strategy (Samejima, 1977). Since the strategies could not
be placed on an equal footing in terms of the data employed, this was only a
gross comparison to assess whether the predictive testing strategy is
worthy of further investigation.

The comparison was made in terms of the answers to the
following questions. (a) What is the relationship between total test
scores and the predicted probability, estimated ability, or proportion
correct obtained from the adaptive tests? (b) What is the
proportion of misclassification into mastery or non-mastery by the
adaptive tests in comparison with an arbitrary cut-off score on the
total test? (c) What is the minimum number of items required for
the adaptive testing decisions? (d) For strategies involving a prior,
what is the effect of different priors on the prediction or estimation
of ability?



Table 1: Specification of strategies used in the comparison

                      Predictive     Flexilevel     Wald       Bayesian         Max. likelihood

Minimum no. of items  7              7              7          7                7
Maximum no. of items  23             23             23         23               23
Difficulty            Traditional    Traditional    NA         LOGIST           LOGIST5
                      from sample    from sample
Discrimination        NA             NA             NA         LOGIST           LOGIST5
Guessing              NA             NA             NA                          LOGIST5

Prior                 Beta distr.                              Normal distr.
  High                g=2, h=1       NA             NA         M=.5, S.D.=1     NA
  Mid.                g=2, h=2       NA             NA         M=0, S.D.=       NA
  Low                 g=1, h=2       NA             NA         M=-.5, S.D.=1    NA

Mastery regions                                                Ability est.     Ability est.
  Mastery             δ3=0.3         NA             0.8, 0.7   0.65 or higher   0.65 or higher
  Undecided           δ2=0.4         NA
  Non-mastery         δ1=0.3         NA             0.5, 0.3   Below 0.65       Below 0.65

α                     NA             NA             0.05       NA               NA
β                     NA             NA             0.05       NA               NA

Termination criteria  Max(p1, p3)    NA                        Error var. 0.08  Test info. 12



The data for this comparison were obtained from Form A of a
college math placement test. This 45-item test was administered to 800
students registered for math courses. It was part of the field testing of
math placement test items for developing a computerized adaptive
placement test based on item response theory. Thus, estimates of the
difficulty, discrimination, and guessing parameters are available for
these 45 items (Hsu & Shermis, 1987).

In order to compare the performance of the five strategies, the
adaptive portion of the study was simulated. In other words, the
items were administered one at a time, but the response to each
item was taken from the examinee's response on the answer sheet.
Response data from 50 randomly selected subjects were used for this
comparison.

Specifications for each strategy used in the comparisons are
summarized in Table 1. Several notes are in order. The maximum
number of items administered was set at 23 because, for a 45-item
test, 23 items were required by the flexilevel strategy. Three
different priors were used for the predictive strategy and the Bayesian
strategy. Although they are based on different distributions, they are
approximately equivalent. Two sets of mastery and non-mastery
criteria (0.8, 0.5 and 0.7, 0.3) were employed for Wald's SPRT. They
are identified as SPRT1 and SPRT2, respectively. Since these sets of
criteria cannot be related to the mastery regions specified for the
predictive strategy, the comparisons between these strategies should
be interpreted with caution.

Simulation results for all strategies were compared with the
results of the complete test in Table 2. Students whose complete test
scores in percentage were 65 or more were assigned to the mastery
group. Eleven students were classified into the mastery group and 39
were classified into the non-mastery group. Under the assumption that all 50
students were at the low ability level, the predictive strategy administered
20 items on average. As a result, 9 out of 11
students remained in the mastery group. Under the medium ability
assumption, there were 7 master students, and the average number of
items administered was 21. If all the students were assumed to
have a high ability level, 13 of them were assigned to the mastery
group. However, only 10 of the 13 students were correctly classified.
On average, 20 items were administered under the assumption of a
high ability level. The number of items used under the three
different priors varied between 7 and 23.

In the self-scoring flexilevel test, students were assigned to
mastery categories based on the percentage of correct answers on the 23
items administered. Any student whose score in percentage was
above 65 was assigned to the mastery category.
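
For readers unfamiliar with the flexilevel procedure, the sketch below follows the usual description of Lord's branching rule: start with the item of median difficulty, move to the next harder unused item after a correct answer and to the next easier unused item after an incorrect one, and stop after (N + 1) / 2 items. The function name and the data layout are assumptions, and scoring here is simply the percentage correct, as in the paragraph above.

```python
def flexilevel(responses_easy_to_hard):
    """Simulate a flexilevel test from recorded answers.

    responses_easy_to_hard: list of 0/1 answers, one per item, with items
    ordered from easiest to hardest; the list length must be odd.
    Returns the administered (item index, response) pairs in order.
    """
    n_items = len(responses_easy_to_hard)
    if n_items % 2 == 0:
        raise ValueError("the item pool must contain an odd number of items")
    test_length = (n_items + 1) // 2
    current = n_items // 2                  # item of median difficulty
    harder, easier = current + 1, current - 1
    administered = []
    for _ in range(test_length):
        correct = responses_easy_to_hard[current]
        administered.append((current, correct))
        if correct:
            current, harder = harder, harder + 1    # next harder unused item
        else:
            current, easier = easier, easier - 1    # next easier unused item
    return administered

# A 45-item pool always yields 23 administered items.
all_correct = flexilevel([1] * 45)
all_wrong = flexilevel([0] * 45)
percent_correct = 100.0 * sum(r for _, r in all_correct) / len(all_correct)
```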

Wald's sequential probability ratio test was used twice with different
mastery and non-mastery proportions. Twelve students were
assigned to the mastery category with criteria of 0.80 and 0.50
(SPRT1), of which 8 were correctly classified. Among the 25 students in
the mastery category when using 0.70 and 0.30 (SPRT2), only 10
were correctly classified. The average number of items administered
was 12 for SPRT1 and 10 for SPRT2. The number of items used for
these two tests ranged between 3 and 21.
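
A minimal sketch of Wald's sequential probability ratio test for mastery decisions is shown below, using the SPRT1 proportions (0.80 and 0.50) and error rates of 0.05 from Table 1. It is the textbook form of the test for binary responses, not necessarily the exact implementation used in the study, and the function and parameter names are assumptions.

```python
import math

def sprt(responses, p_nonmaster=0.5, p_master=0.8, alpha=0.05, beta=0.05):
    """Classify an examinee from a sequence of 0/1 item responses.

    Returns (decision, number of items used).
    """
    upper = math.log((1 - beta) / alpha)     # crossing it accepts "master"
    lower = math.log(beta / (1 - alpha))     # crossing it accepts "non-master"
    log_ratio = 0.0
    for count, correct in enumerate(responses, start=1):
        if correct:
            log_ratio += math.log(p_master / p_nonmaster)
        else:
            log_ratio += math.log((1 - p_master) / (1 - p_nonmaster))
        if log_ratio >= upper:
            return "master", count
        if log_ratio <= lower:
            return "non-master", count
    return "undecided", len(responses)

decision_a = sprt([1] * 23)   # ("master", 7)
decision_b = sprt([0] * 23)   # ("non-master", 4)
```

With these settings an unbroken run of correct answers needs seven items to reach the mastery boundary, while four consecutive errors are enough for a non-mastery decision.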

Table 2: Comparisons of average number of items administered
and classification of students according to the total test
scores

Adaptive        Average Number      No. of Students        No. of Students
Testing         (and st. dev.)      Correctly Classified   Misclassified as
Strategies      of the Items        as Master,             Master, Non-master
                Administered        Non-Master             (phi coeff.)

Complete test   45     (0.00)       11, 39                 0, 0   (1.00)

Pred.(Low)      19.86  (5.35)       11, 39                 2, 2   (0.77)
Pred.(Med)      21.40  (4.84)       7, 43                  0, 4   (0.76)
Pred.(High)     20.36  (5.17)       13, 37                 3, 1   (0.79)

Flex.           23.00  (0.00)       8, 42                  0, 3   (0.82)

SPRT1           11.70  (6.86)       12, 38                 4, 3   (0.61)
SPRT2           9.86   (5.79)       25, 25                 15, 1  (0.43)

Max.            22.78  (0.93)       13, 37                 2, 0   (0.89)

Bayes(Low)      22.80  (0.76)       6, 44                  1, 6   (0.55)
Bayes(Med)      22.28  (1.65)       8, 42                  0, 3   (0.82)
Bayes(High)     20.86  (2.72)       12, 38                 3, 2   (0.72)


The maximum likelihood decision strategy had the smallest total
number of students misclassified into the mastery and non-mastery
categories. This strategy assigned 13 students to the mastery
category. Two of the 13 students were misclassified. The Bayesian
decision strategy assuming low prior ability assigned only 6 students
to the mastery category, and only one was incorrectly classified. The number of
students assigned to the mastery category increased when medium or
high prior ability beliefs were assumed. For the Bayesian decision
strategy with the medium prior ability belief, the number of mastery
students was 8, and they were all correctly classified. For the high
ability level assumption, the number of students in the mastery category
was 12. Three of the 12 students were misclassified into the mastery
category. The average numbers of items used by the maximum likelihood and
Bayesian strategies were 23 and 22, respectively.

Table 3 presents the correlations between total test scores
(Total) and the percentage-correct scores obtained from the
flexilevel test, SPRT1, and SPRT2. For the maximum likelihood and
Bayesian decision strategies, correlations were computed between
total score and the estimated ability obtained. For the predictive test,
correlations were computed between total test scores and final
predictive probabilities.

Table 3: Correlations between total test scores, the final
estimate of ability scores, predictive probabilities or
percentage correct scores

         Total  PredL  PredM  PredH  Flex   SPRT1  SPRT2  Max    BayL   BayM   BayH

Total    1.00   0.82   0.87   0.77   0.89   0.63   0.71   0.76   0.87   0.88   0.86
PredL           1.00   0.89   0.81   0.91   0.47   0.65   0.64   0.63   0.67   0.70
PredM                  1.00   0.84   0.91   0.51   0.62   0.66   0.69   0.70   0.74
PredH                         1.00   0.84   0.52   0.68   0.59   0.62   0.64   0.66
Flex                                 1.00   0.54   0.62   0.66   0.72   0.77   0.76
SPRT1                                       1.00   0.89   0.41   0.51   0.54   0.49
SPRT2                                              1.00   0.47   0.55   0.57   0.53
Max                                                       1.00   0.75   0.78   0.76
BayesL                                                           1.00   0.96   0.85
BayesM                                                                  1.00   0.89
BayesH                                                                         1.00



The correlations between total test scores and Pred(Low),
Pred(Med), and Pred(High) are highly comparable with those of the
maximum likelihood and Bayesian strategies. The correlation
coefficients between the predictive tests and complete test scores are
higher than the correlations between the complete test scores and
both of Wald's sequential tests, and the correlation between the
complete test scores and the maximum likelihood strategy. It should
be noted that the average total test score of the 50 students in
percentage was 47 and the median score was 42. Therefore, the low
ability or medium ability
assumptions were more appropriate than the high ability assumption.
This is probably the reason why the correlation between the
complete test and the predictive test assuming high ability is relatively
lower. This interpretation may not be applicable to the Bayesian
strategies, however.

SUMMARY 

The results presented in the previous sections are based on data
obtained from prediction analysis, Lord's flexilevel test, Wald's
sequential test, and the Bayesian and maximum likelihood strategies.
MicroCAT (Assessment Systems Corporation, 1987) was used for
simulations of adaptive testing involving the Bayesian and maximum
likelihood strategies. Findings of this study may be summarized as
follows:

(a) The final predictive probabilities obtained from predictive
analysis are significantly correlated with the total scores. These
correlations are highly comparable with the correlations between the
total test scores and the other strategies.

(b) In terms of the proportion of misclassification into the
mastery or non-mastery categories, predictive adaptive testing
performs better than the sequential probability ratio tests and almost
as well as the Bayesian strategy.

(c) The number of items required is almost the same as the
number required by the flexilevel test and the maximum likelihood and
Bayesian strategies. Probably because the number of items in the
total test is too small, the benefit of adaptive testing could not be
demonstrated.

(d) There is no effect on the number of misclassifications of
students into categories when different priors are used in
predictive testing. But in Bayesian decisions, the use of different
prior distributions may produce different numbers of
misclassifications.